Corpora at Linguateca: Vision and roads taken

Authors

  • Diana Santos
  • Eckhard Bick
Abstract

In the late nineties, access to Portuguese data in electronic form was scarce and was considered one of the bottlenecks limiting the advance of natural language processing of Portuguese (Santos, 1999a). Linguateca therefore launched AC/DC to significantly increase the amount of available data and also its quality, since the data was annotated and classified. To the best of my knowledge, AC/DC was the first service on the Web to provide free and unencumbered access to a set of Portuguese-language materials for linguists wishing to conduct research on Portuguese.

Similar resources

Gramateca: Corpus-Based Grammar of Portuguese

Eckhard Bick. 2000. The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus University Press. Diana Santos. 2014. Corpora at Linguateca: Vision and roads taken. In Tony Berber Sardinha & Telma de Lurdes São Bento Ferreira (eds.), Working with Portuguese Corpora, Bloomsbury, 2014, pp. 219-236. Diana Santos. 2014. Podemos contar com as...

The Corpógrafo - a Web-based Environment for Corpora Research

In this paper we present the Corpógrafo, an integrated web-based environment for corpus linguistics and knowledge engineering that is being developed at the Porto node of Linguateca. The Corpógrafo aims to provide an integrated corpora research environment by making freely available on the web a comprehensive set of text and language tools (http://www.linguateca.pt/corpografo/). We describe the...

Corpógrafo – Applications

This paper will discuss how the Corpógrafo, a suite of on-line tools created by PoloCLUP of the Linguateca project (http://www.linguateca.pt) for the construction and analysis of corpora and the building of terminological databases, has been used for training professional linguists in corpora compilation, terminology extraction, terminology management and information retrieval. Reference will b...

Caminhos percorridos no mapa da portuguesificação: A Linguateca em perspectiva

This article offers a personal assessment of the path taken over the last ten years by Linguateca, a virtual organization seeking to make the processing of the Portuguese language easier and of higher quality. I begin the article with a short historical perspective explaining the context in which Linguateca emerged and its initial objectives for the progress of the field. I then briefly assess the situation...

Porquê o Págico? Razões para uma avaliação conjunta

This article presents the motivation for the Págico (Português Mágico) joint evaluation, organized by Linguateca in 2011-2012 as a means to (i) encourage the development of systems that assist information search in Portuguese; (ii) evaluate the Portuguese Wikipedia; and (iii) study how humans search for answers and compare this with the characteristics of automatic systems. Then...

Publication date: 2013